
feat(zim): add zim uploader in content manager#751

Open: hestela wants to merge 1 commit into Crosstalk-Solutions:dev from hestela:feat/add-zim-uploader

Conversation

hestela commented Apr 20, 2026

Collaborator

Adds a collapsible file uploader that accepts ZIM file uploads into Kiwix.

hestela commented Apr 20, 2026

Collaborator Author

I've only tested a single upload of a ~800 MB file, and it just worked: the file showed up in the Information Library and the content manager. It needs more testing. The uploader text states a file size limit of 80 GB, but I think that's probably not going to work well for an upload unless it's from the same machine. My knowledge base ingestion is broken; I think it's the same issue as #718.

@chriscrosstalk

Collaborator

Nice work — genuinely useful feature, and the shape is right. Security is clean (path traversal check, sanitized filename, atomic .tmp rename, file-type guard all matching our established patterns). The processManually bodyparser bypass is the right call for large-file streaming.

Two small things I'd tighten before merge:

  1. The 80 GB advertised limit is optimistic with XHR upload. @uppy/xhr-upload sends the whole file in a single POST — one TCP blip and the upload starts over from zero. Would suggest either (a) dropping the advertised limit to ~20 GB and adding a note like "For best results, upload from the same machine or over a stable LAN connection. Larger files should be copied directly to the storage volume", or (b) swapping in @uppy/tus for resumable uploads (bigger change, probably a follow-up PR).

  2. response.status(500).send({ message: 'Upload failed', error: error.message }) leaks the raw error string. Same cleanup pattern as the system_update_service.getUpdateLogs nit from the rc.1/rc.2 QA reports — log the error internally, return a generic message to the client.
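
A minimal sketch of the cleanup pattern from item 2, under stated assumptions: `toClientError`, `ClientError`, and `LogFn` are hypothetical names for illustration and are not taken from the PR or the project's logger. The idea is only that the raw error goes to the server log while the client sees a fixed message.

```typescript
// Hypothetical shape of what the client receives; only `message` is exposed.
interface ClientError {
  status: number;
  body: { message: string };
}

// Hypothetical logger signature; swap in the application's real logger.
type LogFn = (message: string, detail: unknown) => void;

// Log the full error server-side and return only a generic message.
function toClientError(error: unknown, log: LogFn): ClientError {
  const detail = error instanceof Error ? error.stack ?? error.message : String(error);
  log("ZIM upload failed", detail);
  return { status: 500, body: { message: "Upload failed" } };
}

// Usage: the raw error text reaches the log, not the response body.
const logged: string[] = [];
const result = toClientError(new Error("ENOSPC: no space left on device"), (msg, detail) => {
  logged.push(`${msg}: ${String(detail)}`);
});
console.log(JSON.stringify(result.body));
```

In a controller this would presumably end with `response.status(result.status).send(result.body)`, so the `error` field from the current response disappears entirely.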

Re: testing — your KB ingestion issue is almost certainly the same num_ctx=2048 bug that bites dense content (same as #388/#756). PR #763 (awaiting Jake's review) fixes the root cause in OllamaService.embed(). Once that lands you should be able to validate the embed-job dispatch path.

For the testing matrix before merge, I'd want to see:

  • Large file (1+ GB) from a different LAN client (not same-machine)
  • Upload while a remote ZIM download is in flight (does the kiwix restart correctly defer?)
  • Upload of a .zim whose filename already exists locally (does rename() clobber or throw?)
  • Path traversal attempt + non-.zim attempt (security regression check)
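
For the security regression check in the last bullet, a rough sketch of the kind of filename guard being exercised. This is an assumption about the general approach, not the PR's actual code: `safeZimName` is a hypothetical helper, and the allowed character set is a deliberately conservative guess.

```typescript
// Returns the filename when it is safe to store, or null when the upload
// should be rejected. Hypothetical helper, for illustration only.
function safeZimName(raw: string): string | null {
  // Reject anything carrying a directory component or a NUL byte.
  if (raw.includes("/") || raw.includes("\\") || raw.includes("\0")) return null;
  if (raw === "." || raw === "..") return null;
  // Allow a conservative character set and require the .zim extension.
  if (!/^[A-Za-z0-9][A-Za-z0-9._-]*\.zim$/i.test(raw)) return null;
  return raw;
}

// Inputs mirroring the test matrix: one valid name, two traversal attempts,
// one non-.zim file, and a bare extension.
const cases = [
  "devdocs_en_node_2026-02.zim",
  "../../etc/passwd",
  "..\\evil.zim",
  "notes.txt",
  ".zim",
];
for (const name of cases) {
  console.log(name, "->", safeZimName(name) === null ? "rejected" : "accepted");
}
```

Rejecting suspicious names outright, rather than trying to repair them, keeps the regression test simple: every traversal or wrong-extension input should produce a rejection.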

Happy to run any of those on NOMAD3 once you feel it's ready for external testing.

hestela force-pushed the feat/add-zim-uploader branch from 4f92e53 to adb33fb on April 23, 2026 06:08

hestela commented Apr 23, 2026

Collaborator Author

Addressed comments.
Working on some bug fixes, some related to the tests you described. I'm trying to make it so that uploading one of the known ZIM files marks it as completed; otherwise it won't be marked as completed unless you re-download it using nomad.
WIP

hestela force-pushed the feat/add-zim-uploader branch from adb33fb to 3e147fe on April 24, 2026 06:45

hestela commented Apr 24, 2026

Collaborator Author

OK, tested/fixed a few things; it should be mostly good now. See the bug info below.
Here's how the uploader behaves:

  • For Wikipedia only: if you upload a bigger or smaller ZIM, the old one gets deleted and the new one is marked as installed. If you delete the ZIM, Wikipedia is updated to not installed. This is how the managed downloader works.
  • Duplicate file names are rejected, i.e., if I already have devdocs_en_node_2026-02.zim, the upload is rejected if I try to upload it again.
  • If you upload, say, 3/5 of the "Computing & Technology" Essential files, it still says it's not installed, but if you click to install Essential, only the missing 2 files are downloaded. The same applies if you download Standard. Similarly, if you upload 3/5 and later upload the missing 2 files, the collection is marked as installed.
  • Deleting a ZIM file from a collection marks the collection as no longer installed, but if you then upload the missing one(s), it's marked as installed again.
  • 5 file uploads are allowed (hard-coded limit, could be changed). Uploads are done in serial. 2 GB on the same network (different hosts) takes ~15, from a SATA 3 drive to ZFS on 4 SATA SSDs over a 10G network.
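
The collection and duplicate rules above can be sketched roughly as follows. The `Collection` shape, the function names, and the `a.zim` through `e.zim` file names are placeholders for illustration, not the PR's data model.

```typescript
// Hypothetical collection shape: a tier name and the ZIM files it needs.
interface Collection {
  name: string;
  files: string[];
}

// Files the collection still needs, given what is already on disk.
function missingFiles(collection: Collection, onDisk: Set<string>): string[] {
  return collection.files.filter((file) => !onDisk.has(file));
}

// A collection counts as installed only when nothing is missing.
function isInstalled(collection: Collection, onDisk: Set<string>): boolean {
  return missingFiles(collection, onDisk).length === 0;
}

// Duplicate file names are rejected rather than overwritten.
function acceptUpload(filename: string, onDisk: Set<string>): boolean {
  if (onDisk.has(filename)) return false;
  onDisk.add(filename);
  return true;
}

// Usage: 3 of 5 files uploaded, then the remaining 2.
const essential: Collection = {
  name: "Essential",
  files: ["a.zim", "b.zim", "c.zim", "d.zim", "e.zim"],
};
const onDisk = new Set<string>(["a.zim", "b.zim", "c.zim"]);
console.log(isInstalled(essential, onDisk), missingFiles(essential, onDisk));
acceptUpload("d.zim", onDisk);
acceptUpload("e.zim", onDisk);
console.log(isInstalled(essential, onDisk), acceptUpload("e.zim", onDisk));
```

Deriving "installed" from the files present, instead of storing a flag, is what lets a later install of the tier fetch only `missingFiles` and lets a deletion flip the status back automatically.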

Bug, not sure how to handle this one:
Start a download of the 2 GB Wikipedia, for example: it will let you upload the same file on the side, and then the downloaded file will just overwrite the uploaded one. I asked Claude to fix this, but it suggested just blocking all uploads while any download is in progress.
Before I tried that fix, it would let me upload other ZIM files during a download without issue, so I could see it going either way.
The question is: should we only allow 1 upload/download at a time, or let the user go crazy? Maybe we could get it to block only suspected dupes. Otherwise this should be good for more testing; it's pretty solid besides this bug. I didn't push the fix for this bug in the latest force push.
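
To make the two options concrete, here is a small sketch of both gating policies. The `Policy` type, `canUpload`, and the file names are hypothetical, and how the real code tracks in-flight downloads is not shown in this thread.

```typescript
// "block-all": no uploads while any download is running.
// "block-duplicates": only block an upload whose name matches an active download.
type Policy = "block-all" | "block-duplicates";

function canUpload(filename: string, activeDownloads: string[], policy: Policy): boolean {
  if (policy === "block-all") return activeDownloads.length === 0;
  return !activeDownloads.includes(filename);
}

// Usage: one download in flight, two candidate uploads (placeholder names).
const active = ["wikipedia.zim"];
console.log(canUpload("wikipedia.zim", active, "block-duplicates"));
console.log(canUpload("devdocs.zim", active, "block-duplicates"));
console.log(canUpload("devdocs.zim", active, "block-all"));
```

The duplicate-only policy matches file names exactly, so it would miss a download that lands under a different name for the same content; blocking everything is coarser but leaves no such gap.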

hestela force-pushed the feat/add-zim-uploader branch from 3e147fe to 15fc57c on April 29, 2026 04:54

hestela commented Apr 29, 2026

Collaborator Author

Resolved the conflict with a force-push. Did a quick test by deleting and then re-uploading the same ZIM file.

@chriscrosstalk

Collaborator

Go ahead and block all uploads while any download is in progress. I think that's a good remediation for these issues.

hestela force-pushed the feat/add-zim-uploader branch from 15fc57c to b9448a0 on May 1, 2026 03:18

hestela commented May 1, 2026

Collaborator Author

@chriscrosstalk, done. I just tested it: I ran a download in Content Explorer, and while that was running I repeatedly uploaded the ~300 MB and then the 2 GB Wikipedia. It didn't care; it let me keep switching between the two on the side. Even if you were to do an upload and a download of the same name, the latest one gets to be the "winner", I'll say.

@chriscrosstalk

Collaborator

Thanks @hestela, this is still a feature we want. Two things before it can move forward:

  1. The branch has drifted out of sync and now conflicts with dev. Could you rebase onto current dev?
  2. Since you opened this, Kiwix gained native library-mode plus a "Rescan Library" button (feat(zim): "Rescan Library" button for sideloaded ZIM files #967). The uploader should slot into that flow rather than manage the library separately, so it's worth reconciling against that to avoid duplicating the rescan/library handling.

Once it's rebased and reconciled we'll give it a full review.

hestela force-pushed the feat/add-zim-uploader branch from 6271bee to a7318ff on June 14, 2026 05:13
hestela force-pushed the feat/add-zim-uploader branch from a7318ff to e955dc1 on June 14, 2026 05:26

hestela commented Jun 14, 2026

Collaborator Author

@chriscrosstalk, addressed your comments and rebased. Tested by uploading a few different Wikipedia tiers, and that logic still works (you can switch tiers by uploading a different ZIM).

